author: Frederick Hoenisch

subject: Enabled Rdb AIJs causes BAS-F-MEMMANVIO in PowerHouse
call to External Product

Dec 24, 2004 16:56:16 GMT 

After waiting for a patch and upgrading Oracle's Rdb product to
v7.1-241, we re-enabled After Image Journaling (AIJ) on one of our
databases.

When users perform a particular function (called an assessment) from a
PowerHouse (PH) screen, they intermittently get a stack dump:
"BAS-F-MEMMANVIO, memory management violation". This problem seems
to increase in frequency as system loading increases.

After some investigating/head scratching we tried turning off AIJs (the
only thing left that we felt we had changed) and the problem stopped.

We've recompiled the PowerHouse code which links to an external product
(address/postal code checking software) and get the same problem.

We've opened calls with the three vendors for advice:
COGNOS (PH)
ORACLE (Rdb)
COMDATA (PC Lookup)

COMDATA provided me with some source code with some additional
checking within the code to help us narrow down the problem. I recompiled
it under BASIC 1.5 with the /CHECK=ALL qualifier. The resulting .OBJ was
recompiled into the PowerHouse routine and our latest attempt to test it
yielded the following:

%BAS-F-MEMMANVIO, Memory management violation
-BAS-I-USEPC_PSL, at user PC=80C981A4, PSL=0000001B
-SYSTEM-F-ACCVIO, access violation, reason mask=04, virtual address=0000C
-BAS-I-FROLINSUB, from line 2999 in subprogram MMPREP
%TRACE-F-TRACEBACK, symbolic stack dump follows
image       module   routine          line  rel PC            abs PC
                                         0  FFFFFFFF80C9A984  FFFFFFFF80C9A984
DEC$BASRTL                               0  000000000000EF9C  000000007C1D8F9C
----- above condition handler called with exception 0000000C:
%SYSTEM-F-ACCVIO, access violation, reason mask=04, virtual address=0000C
----- end of exception message
                                         0  FFFFFFFF800A609C  FFFFFFFF800A609C
                                         0  FFFFFFFF80C981A4  FFFFFFFF80C981A4
QKDRIVER    MMPREP   MMPREP           1218  0000000000005834  00000000001122D4
                                         0  FFFFFFFF80C9A894  FFFFFFFF80C9A894
QKDRIVER    DRIVER   call_external    7563  0000000000000664  00000000000506C4
QKDRIVER    DRIVER   driver_mainline  1318  0000000000000048  0000000000050048
                                         0  FFFFFFFF8026FE94  FFFFFFFF8026FE94



Here's an extract of the MMPREP routine from line 2999 (provided with
permission from COMDATA):



2999 EXTRA_INFO$ = EDIT$(EXTRA_WORK$,8% + 128%) ! STR
     LSET MMISER_OUTREC_NONADDRESS = EXTRA_INFO$
     IF TRACE$ = 'V' THEN
        PRINT 'LEAVING MMPREP ';MMISER_OUTREC_NONADDRESS
        PRINT 'ADDRESS 1 ';MMISER_INREC_ADDRESS1
        PRINT 'ADDRESS 2 ';MMISER_INREC_ADDRESS2
     END IF

     IF EXTRA_INFO$ <> SP THEN
        LSET MMISER_OUTPAR_ERRORTEXT(0%) = '86 Extra Informal
     ELSE
        IF EDIT$(MMISER_INREC_ADDRESS1 + ' ' + &
           MMISER_INREC_ADDRESS2,16%+32%+128%) <> &
           EDIT$(ORIG_ADDR1$,16%+32%)
        THEN
           LSET MMISER_OUTPAR_ERRORTEXT(0%) = '80 Abbreviation'
        END IF
     END IF

     ! RESET PASSING PARAMETERS

     INPAR$ = MMISER_INPAR_WHOLE
     INREC$ = MMISER_INREC_WHOLE
     OUTPAR$ = MMISER_OUTPAR_WHOLE
     OUTREC$ = MMISER_OUTREC_WHOLE

     GOTO 32767



Note: We've checked quotas as well and they appear to be adequate (as
reported from the PQUOTA utility).

Typical user's quotas are as follows:
Maxjobs:         0  Fillm:      8192  Bytlm:    1500000
Maxacctjobs:     0  Shrfillm:      0  Pbytlm:         0
Maxdetach:       0  BIOlm:       100  JTquota:     4096
Prclm:           6  DIOlm:       200  WSdef:       1024
Prio:            4  ASTlm:       206  WSquo:      20000
Queprio:         0  TQElm:        50  WSextent:   70000
CPU:        (none)  Enqlm:     32767  Pgflquo:  1500000

Any thoughts/comments would be appreciated. 

VMS 7.3-2 
Rdb 7.1-241 
PH7.10G1 

The PC_LOOKUP product hasn't had any program updates for a number of
years.

Yours truly, 
Fred. 

Robert Gezelter Dec 24, 2004 17:03:47 GMT 0 pts

Fred,

Does the program fault in the same place every time?

- Bob Gezelter, http://www.rlgsc.com

Frederick Hoenisch Dec 24, 2004 17:08:24 GMT N/A: Question Author

Yes, but because it is a production application, we didn't have many
opportunities to test before disabling AIJs.

Of the failures, all were at the same point.

Robert Gezelter Dec 24, 2004 17:25:48 GMT 6 pts

Fred,

Personally, I would probably set things up so that I could single step the
program at a machine code or source code level to determine EXACTLY which
call, and what parameter, it is having a problem with. As a start, check what
is at the address 80C981A4, which would appear to be in system space. At first
glance, it seems unlikely that an AIJ issue would produce a synchronous
problem, one that occurs at exactly the same place every time.

I would really want to get the debugger on it and see exactly which call
to which routine is causing the problem. Otherwise, we are working from
speculation, not facts.

- Bob Gezelter, http://www.rlgsc.com

Robert Gezelter Dec 24, 2004 17:27:54 GMT 6 pts

Fred,

A followup thought. A quota related issue could produce a timing-independent
problem. In any event, the debugger would allow precise identification of the
problem.

- Bob Gezelter, http://www.rlgsc.com

Garry Fruth Dec 24, 2004 17:58:24 GMT 7 pts

I suggest you add a line number to every statement; this may help narrow it
down to a single statement. E.G.

2999 EXTRA_INFO$ = EDIT$(EXTRA_WORK$,8% + 128%) ! STR
29991 LSET MMISER_OUTREC_NONADDRESS = EXTRA_INFO$
29992 IF TRACE$ = 'V' THEN
29993 PRINT 'LEAVING MMPREP ';MMISER_OUTREC_NONADDRESS
29994 PRINT 'ADDRESS 1 ';MMISER_INREC_ADDRESS1
29995 PRINT 'ADDRESS 2 ';MMISER_INREC_ADDRESS2
29996 END IF

Line numbers do not need to be sequential nor in order; but they do need to be
unique. The compiler should let you know about duplicates.

I suspect the accvio occurs in the four lines that "RESET PASSING
PARAMETERS". If the calling program passed fixed-length string buffers
rather than using dynamic strings (I think my terminology may be a bit off on
this), then changing the length of what is passed may not be legal.

Frederick Hoenisch Dec 24, 2004 19:47:54 GMT N/A: Question Author

Thank you for the responses thus far. I failed to mention that our attempts to
duplicate the problem in our non-production environments all failed.

Because the problem only shows itself when the system is busy, I will
discuss with the Application Manager the possibility of trying to debug the
problem during prime time (this is unlikely).

PowerHouse has a debugger, but we're not sure (at this point) if it will
debug the external calls (BASIC code). Something to try and experiment with
over the holidays, I guess.

Volker Halle Dec 25, 2004 11:50:02 GMT 7 pts

Fred,

consider issuing a SET PROC/DUMP/ID=xxx against a process running the
BASIC image, before invoking the problematic user function. Once the
ACCVIO happens, you'll get a process dump (SYS$LOGIN:imagename.DMP),
which can be analyzed by ANAL/PROC. You can do analysis of the dump in
a non-realtime environment first. You can find the failing instruction, the call
stack leading to the problem, and look at the memory contents being accessed,
etc.

To make sure that a complete process dump (including process-permanent data
from system space) can be written, you may want to (temporarily) grant the
IMGDMP$READALL right to the user running the application.

Trying to analyse the problem in 'real-time' may be difficult, especially if the
ACCVIO is dependent on system load. You'll never know whether you will hit the
problem. It may even be impossible to reproduce it while running the debugger.

Volker.

Volker Halle Dec 25, 2004 12:51:47 GMT 7 pts



Fred, 

could you also try the following, please:

If we believe the PC value reported by BAS-I-USEPC_PSL, you can issue the
following SDA command on the system where this problem had happened:

$ ANAL/SYS
SDA> EXA/INS 80C981A4

As the RM (Reason Mask) has been reported as 04 (=WRITE), the
DESTINATION address pointed to by the instruction producing the ACCVIO
must point to non-writeable memory.

SDA> MAP 80C981A4

may also tell which execlet/image/library this code is in.

If the reported instruction, registers etc. make sense, try SDA> EXA/INS
80C981A4-20;20 to find out where the 'invalid' value (probably 000...,
which is the failing VA reported in the ACCVIO) in the register may have been
loaded from.



Volker. 
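
The two observations made in this thread about the traceback (the failing PC lies in system space, and reason mask 04 means a write) can be restated as a small check. This is only an illustrative Python sketch, not VMS tooling: the function names are hypothetical, it assumes the addresses are 32-bit values (or their sign-extended 64-bit form) as printed in the traceback, with system space starting at 80000000, and it takes the meaning of mask bit 04 from the reply above. Other reason mask bits are deliberately not interpreted.

```python
# Assumption: 32-bit view of the address space, system space from 80000000 up.
SYSTEM_SPACE_BASE = 0x80000000
# Per the reply above: reason mask 04 means the faulting access was a WRITE.
WRITE_BIT = 0x04


def in_system_space(pc_hex: str) -> bool:
    """Return True if the low 32 bits of the hex address are >= 80000000.

    Only meaningful for 32-bit or sign-extended addresses such as those
    shown in the traceback; a true 64-bit address would need a different test.
    """
    pc = int(pc_hex, 16) & 0xFFFFFFFF
    return pc >= SYSTEM_SPACE_BASE


def is_write_fault(reason_mask_hex: str) -> bool:
    """Return True if the write bit (04) is set in the reason mask."""
    return bool(int(reason_mask_hex, 16) & WRITE_BIT)


if __name__ == "__main__":
    print("failing PC in system space:", in_system_space("80C981A4"))
    print("return PC in system space: ", in_system_space("00000000001122D4"))
    print("fault was a write:         ", is_write_fault("04"))
```

This only classifies the values already printed; identifying the routine at that PC still needs SDA or a process dump as described above.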

Volker Halle Dec 25, 2004 19:22:04 GMT 7 pts



Fred, 



from experimenting with a little BASIC test program:

         0 FFFFFFFF800A609C FFFFFFFF800A609C <<< SYS$CALL_HANDL_C+0002C
         0 FFFFFFFF80C981A4 FFFFFFFF80C981A4 <<< failing instruction
QKDRIVER MMPREP MMPREP 1218 0000000000005834 00000000001122D4

Source code line 1218 in the MMPREP module seems to be the last line
executed in this module before calling a routine in system space, which caused
the ACCVIO at PC 80C981A4. Finding this line requires a source or listing
from the version of the module running at your site.

PC 1122D4 must be a return address (following a JSR instruction).

Taking a process dump from a failing BASIC program works fine. With a
process dump, you have all the information you need to figure out
which instruction is failing due to which invalid address in which register.

Volker.

David Sneddon Dec 26, 2004 01:37:20 GMT 7 pts



Fred, 

As a longtime user of BASIC, my first instinct on 
the info you supplied would be to increase pgflquo 
(try doubling it). 

Regards 
Dave 

Frederick Hoenisch Jan 4, 2005 19:49:46 GMT N/A: Question Author

Thanks again All:

The next opportunity to test is this weekend and the first 'load' test i 
morning. 

I'll give it a go and let you know. Thanks for the advice. For this next test I
intend to:

1. Double PGFLQUO for all users of the app.
2. Generate PROCESS dump files.
3. Add more line numbers to MMPREP.

Yours truly,
Fred.

Ian Miller Jan 5, 2005 10:11:01 GMT unassigned



Could there be an issue with the location of the AIJ which is badly handled and
leads to the ACCVIO? Is the AIJ file placed on a disk on which there is enough
free space, and are the file and directory protections correct?



David Sneddon Jan 5, 2005 10:34:35 GMT unassigned 
I think the AIJ stuff probably requires some memory
that is blowing the pgflquo value. I have seen our 
developers make "small" changes like allocating a 
new 10000 element string array and suddenly the 
application falls over with ACCVIO errors. Increasing 
pgflquo makes it go away. The fact that Fred can't 
reproduce it in a test environment suggests that 
the quotas in the test environment are probably 
different to the production environment. 

Dave 



Ian Miller Jan 5, 2005 10:39:53 GMT unassigned

Dave, you are correct that quotas are the first thing to suspect. I was
wondering about other possible causes.

David Sneddon Jan 5, 2005 10:54:35 GMT unassigned



Yes it may be something else but all MEMMANVIO 
errors I have seen over the years have been either 
pgflquo issues or recursive/circular calls not 
terminating (ending up in stack overflows). 
Failure to create/rename files due to insufficient 
contiguous space for directories usually manifest 
as ACP type errors. 

Dave 

David Sneddon Jan 5, 2005 10:58:30 GMT unassigned



Having just re-read the original and the comment 
about things breaking when the load increases, 
is it possible that there is insufficient pagefile 
available? Are there now more users than there 
used to be? 

Jan van den Ende Jan 5, 2005 12:20:51 GMT unassigned 




Fred, 



Having read and re-read the whole stream, I am still quite in doubt whether this
is to the point for you or not, but since I found no definitive argument to rule it
out, I will just relate our experience (and workaround).

We first stumbled on it when we tried to implement Clusterwide Logical Names.

In ANY area definition in Rdb (and also DBMS), internally the logicals
are evaluated up to the level where a physical or a Concealed Device is found.
A Concealed Device is then tested to be valid, explicitly, in
LNM$SYSTEM_TABLE, EXECUTIVE mode.
Should somehow a reference be made to a data area that does NOT pass this test,
then an Access Violation is generated. How that would trickle down to
Powerhouse is unknown to me.

Oracle was able to confirm our findings (back in February 2000), and promised
support for clusterwide tables 'in the next release'. We have since found
it not yet repaired in any following release...

As I started this post with, it might have nothing to do with your problem, but it
might well be worth checking.



Proost.

Have one on me. 
Jan 



Wed, 17 Mar 2004 14:20:43 -0500 Vathix <vathix xx dprogramming.com>
writes:



Object o; 
assert(o); 






Instead of making sure "o" is a valid reference, it causes an access 
violation when trying to run its invariant. I think it should check for 
null first; it makes it easier, and would do what most newbies expect. 






Christopher E. Miller 



Thu, 18 Mar 2004 16:31:03 -0800 "Walter" <walter xx digitalmars.com> 
writes: 



"Vathix" <vathix xx dprogramming.com> wrote in message 
news:c3a8eb$304f$l xx digitaldaemon.com... 

> Object o;
> assert(o);
>
> Instead of making sure "o" is a valid reference, it causes an access
> violation when trying to run its invariant. I think it should check for
> null first; it makes it easier, and would do what most newbies expect.

An access violation is an exception, and if the invariant fails an exception 
is also thrown. All an access violation is is the hardware checking for the 
error rather than checking for it with software. 
Fri, 19 Mar 2004 07:24:51 -0500 Vathix <vathix xx dprogramming.com>
writes:

Walter wrote: 

> "Vathix" <vathix xx dprogramming.com> wrote in message 
>news:c3a8eb$304f$l xx digitaldaemon.com... 



Digital Mars - D - assert(Object)
http://www.digitalmars.com/d/archives/25760.html

>
>> Object o;
>> assert(o);
>>
>> Instead of making sure "o" is a valid reference, it causes an access
>> violation when trying to run its invariant. I think it should check for
>> null first; it makes it easier, and would do what most newbies expect.
>
> An access violation is an exception, and if the invariant fails an exception
> is also thrown. All an access violation is is the hardware checking for the
> error rather than checking for it with software.
>

I just mean it'd be nice if assert(o) translated into assert(o != null
&& o.invariant()) instead of just assert(o.invariant()) so it works more
like an if(o) statement. It's in debug mode so the extra check shouldn't
be important. It's not a big deal, as I've gotten used to typing
assert(o != null); - it's easier than running the program through the
debugger.



Christopher E. Miller 
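
The behaviour requested in this post (test the reference for null first, and only then run the invariant) can be modelled outside D. The following is a minimal sketch in Python rather than D, under stated assumptions: `Widget`, its `invariant` method, and `check_object` are hypothetical names invented for illustration, and `None` stands in for D's `null`. It is not how the D compiler implements `assert`.

```python
class Widget:
    """Hypothetical class with an invariant, standing in for a D class."""

    def __init__(self, size: int) -> None:
        self.size = size

    def invariant(self) -> bool:
        # The invariant for this toy class: size is never negative.
        return self.size >= 0


def check_object(o) -> None:
    """Model of the proposed assert(o): null test first, then the invariant.

    Raises AssertionError in both failure cases, so a null reference is
    reported the same way as a broken invariant instead of faulting.
    """
    if o is None:
        raise AssertionError("object reference is null")
    if not o.invariant():
        raise AssertionError("object invariant failed")


if __name__ == "__main__":
    check_object(Widget(3))  # passes silently
    for bad in (None, Widget(-1)):
        try:
            check_object(bad)
        except AssertionError as exc:
            print("assertion failed:", exc)
```

Explicit `raise` is used instead of Python's own `assert` statement so the checks are not stripped when Python runs with `-O`.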